<<<<<<< HEAD ======= >>>>>>> 667868adab929cae16d0a55c63183dce21b1a2c5

Introduction and Motivation

The five leading causes of death in the United States from 1999 to 2014 are cancer, heart disease, unintentional injury, chronic lower respiratory disease, and stroke. The dataset includes the U.S. Department of Health and Human Services public health regions. Therefore, we can investigate the leading causes of death of each region, and develop accordingly public health policy and remedies.

Data Descriptions

Approaches

Analysis and Visualizations

Loading Data

cod_data = read_csv("./data/NCHS_-_Potentially_Excess_Deaths_from_the_Five_Leading_Causes_of_Death.csv") %>%
  clean_names() %>%
  na.omit() %>%
  filter(!(state == "United States")) %>%
  separate(., percent_potentially_excess_deaths, into = c("percent_excess_death"), sep = "%") %>% 
  mutate(percent_excess_death = as.numeric(percent_excess_death), mortality = observed_deaths/population * 10000, mortality = as.numeric(mortality)) %>% 
  select(year, age_range, cause_of_death, state, locality, observed_deaths, population, expected_deaths, potentially_excess_deaths, percent_excess_death, mortality, hhs_region)
## Warning: Too many values at 191748 locations: 1, 2, 3, 4, 5, 6, 7, 8, 9,
## 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...

##columns removed

 #"state_fips_code"                      "benchmark"   "potentially_excess_deaths" "percent_excess_death"      "mortality"   
  • the year variable contains data collected from 2005-2015.
  • filtered out United States in the state variable.
  • Added a variable mortality which is calculated by observed_deaths/population * 10000. This variable indicates the number of deathes observed in every 10000 people in the three geographic regions: Metropolitan, Nonmetropolitan and All.
region_cod_data = cod_data %>%
  select(state, locality, hhs_region, percent_excess_death) %>% 
  group_by(state,locality, hhs_region) %>% 
  summarise(mean_ped = mean(percent_excess_death)) %>% 
  dplyr::filter(!(state == "District of\nColumbia")) %>% 
  mutate(hhs_region = as.character(hhs_region))


region_lm = lm(region_cod_data$mean_ped~region_cod_data$hhs_region)
rl1 = broom::tidy(region_lm)
kable(rl1)
term estimate std.error statistic p.value
(Intercept) 22.897478 2.124541 10.777612 0.0000000
region_cod_data$hhs_region10 6.919893 3.302733 2.095202 0.0379936
region_cod_data$hhs_region2 -1.678888 4.456475 -0.376730 0.7069571
region_cod_data$hhs_region3 15.028044 3.161418 4.753577 0.0000050
region_cod_data$hhs_region4 27.459220 2.776844 9.888645 0.0000000
region_cod_data$hhs_region5 11.065534 2.962531 3.735162 0.0002743
region_cod_data$hhs_region6 23.938689 3.103091 7.714466 0.0000000
region_cod_data$hhs_region7 12.500192 3.302733 3.784802 0.0002292
region_cod_data$hhs_region8 5.138616 2.962531 1.734536 0.0850722
region_cod_data$hhs_region9 8.502314 3.302733 2.574327 0.0111054

region_locality_lm =lm(region_cod_data$mean_ped ~ region_cod_data$hhs_region+region_cod_data$locality)
rl2 = broom::tidy(region_locality_lm)
kable(rl2)
term estimate std.error statistic p.value
(Intercept) 21.518530 2.080719 10.3418699 0.0000000
region_cod_data$hhs_region10 6.592074 2.933587 2.2471039 0.0262563
region_cod_data$hhs_region2 -0.892122 3.959828 -0.2252931 0.8220919
region_cod_data$hhs_region3 15.098291 2.807614 5.3776243 0.0000003
region_cod_data$hhs_region4 27.131401 2.466649 10.9992936 0.0000000
region_cod_data$hhs_region5 10.737714 2.631517 4.0804272 0.0000765
region_cod_data$hhs_region6 23.610869 2.756320 8.5660831 0.0000000
region_cod_data$hhs_region7 12.172373 2.933587 4.1493141 0.0000587
region_cod_data$hhs_region8 4.810797 2.631517 1.8281458 0.0697354
region_cod_data$hhs_region9 8.174495 2.933587 2.7865190 0.0060956
region_cod_data$localityMetropolitan -2.159388 1.555863 -1.3879042 0.1674526
region_cod_data$localityNonmetropolitan 7.279690 1.582741 4.5994187 0.0000096

Interpretation: U.S. Department of Health and Human Services public health regions (1 through 10) are used as a categorical variable in the above regressions. Specific region classification is shown in the figure below.

<<<<<<< HEAD

[attach image here].

As shown in the above summary tables, with respect to region 1, only region 2 have negative estimated coefficient, indicating less mean percentage of excess death . Specifically, comparing with region 1, region 2 have 1.67% less mean percentage of excess death on average. From region 3 to region 10, the mean percentage of excess death is higher comparing with region 1. Similiar results yield from regression adjusted for Locality. Adjusting locality, region 4 and 6 have top two highest increase in mean percetage of excess death with respect to region 1. Adjusting for different regions, similar result yield that, on average, metropolitan have 2% less than overall mean percetage of excess deaths and nonmetropolitan have 7% more than overall mean percetage of excess death.

=======


As shown in the above summary tables, with respect to region 1, only region 2 have negative estimated coefficient, indicating less mean percentage of excess death . Specifically, comparing with region 1, region 2 have 1.67% less mean percentage of excess death on average. From region 3 to region 10, the mean percentage of excess death is higher comparing with region 1. Similiar results yield from regression adjusted for Locality. Adjusting locality, region 4 and 6 have top two highest increase in mean percetage of excess death with respect to region 1. Adjusting for different regions, similar result yield that, on average, metropolitan have 2% less than overall mean percetage of excess deaths and nonmetropolitan have 7% more than overall mean percetage of excess death.

>>>>>>> 667868adab929cae16d0a55c63183dce21b1a2c5

Locality vs. Mortality

An Overview of the Data

cod_data %>%
  mutate(year = as.factor(year))
## # A tibble: 191,748 x 12
##      year age_range cause_of_death   state        locality observed_deaths
##    <fctr>     <chr>          <chr>   <chr>           <chr>           <int>
##  1   2005      0-49         Cancer Alabama             All             756
##  2   2005      0-49         Cancer Alabama    Metropolitan             556
##  3   2005      0-49         Cancer Alabama Nonmetropolitan             200
##  4   2005      0-49         Cancer Alabama             All             756
##  5   2005      0-49         Cancer Alabama    Metropolitan             556
##  6   2005      0-49         Cancer Alabama Nonmetropolitan             200
##  7   2005      0-49         Cancer Alabama             All             756
##  8   2005      0-49         Cancer Alabama    Metropolitan             556
##  9   2005      0-49         Cancer Alabama Nonmetropolitan             200
## 10   2005      0-54         Cancer Alabama             All            1346
## # ... with 191,738 more rows, and 6 more variables: population <int>,
## #   expected_deaths <int>, potentially_excess_deaths <int>,
## #   percent_excess_death <dbl>, mortality <dbl>, hhs_region <int>
p <- cod_data %>%
  plot_ly(
    x = ~expected_deaths, 
    y = ~observed_deaths, 
    size = ~population, 
    color = ~cause_of_death, 
    frame = ~hhs_region, 
    text = ~state, 
    hoverinfo = "text",
    type = 'scatter',
    mode = 'markers'
  ) %>% 
  layout(title = "Change of Standardized Mortality Ratio in National Public Health Regions")

p
<<<<<<< HEAD

Descriptive Analysis

=======

Discriptive Analysis

>>>>>>> 667868adab929cae16d0a55c63183dce21b1a2c5
cod_data %>%
  group_by(cause_of_death) %>% 
  ggplot(aes(x = locality, y = mortality, fill = cause_of_death)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title="Locality vs. Mortality") +
  labs(x="locality", y="mortality") 

This bar graph shows the distribution of cause of death within mortality in the three geographic regions: Metropolitan, Nonmetropolitan and All. We observe that the weight of each composition of cuases of death is the same regardless of the locality. Nonmetropolitan area seems to have the highest mortaility, the number of deathes observed in every 10000 people.

cod_data %>%
  group_by(cause_of_death) %>% 
  ggplot(aes(x = locality, y = mortality)) + geom_boxplot(aes(color = cause_of_death), na.rm = T) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title="Locality vs. Mortality") +
  labs(x="locality", y="mortality") 

We observe that the median of mortaility is the highest among all 5 causes of death, followed by heart disease, unintentional injury, heart disease, chronic lowee respiratory disease and stroke. This pattern remains consistant when we subdivide locaility into metropolitan region and nonmetrpolitan region, which is identical to our previous finding. All 5 cuases of death are right skewed regardless of the locality.

Linear Model

mortality_lm = lm(mortality ~locality, data = cod_data)
summary(mortality_lm)
## 
## Call:
## lm(formula = mortality ~ locality, data = cod_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.1454 -2.9547 -1.2983  0.9988 19.8959 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              4.10331    0.01672 245.474   <2e-16 ***
## localityMetropolitan    -0.22576    0.02372  -9.519   <2e-16 ***
## localityNonmetropolitan  1.14451    0.02438  46.942   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.317 on 191745 degrees of freedom
## Multiple R-squared:  0.01821,    Adjusted R-squared:  0.0182 
<<<<<<< HEAD
## F-statistic:  1778 on 2 and 191745 DF,  p-value: < 2.2e-16
rl3 = broom::tidy(mortality_lm)
kable(rl3)
term estimate std.error statistic p.value
(Intercept) 4.1033134 0.0167159 245.474203 0
localityMetropolitan -0.2257558 0.0237156 -9.519299 0
localityNonmetropolitan 1.1445051 0.0243812 46.942205 0
**interpretation on linear regression:* *
Here we obtain the linear regression mo del of morta lity = 4.103 - 0.226localityMetropolitan -1.145localityNonmetropolitan.
We expect to see 0.22576 l ess deaths fo r every 1000 0 people in t he metropolitan region as compared to all regions.
We expect to see 1.14451 m ore death for every 10000 people in th e nonmetropolitan region as compared to all regions.
======= ## F-statistic: 1778 on 2 and 191745 DF, p-value: < 2.2e-16

interpretation on linear regression: Here we obtain the linear regression model of mortality = 4.103 - 0.226localityMetropolitan -1.145localityNonmetropolitan. We expect to see 0.22576 less deaths for every 10000 people in the metropolitan region as compared to all regions. We expect to see 1.14451 more death for every 10000 people in the nonmetropolitan region as compared to all regions.

>>>>>>> 667868adab929cae16d0a55c63183dce21b1a2c5

Mean Percent Excess Death vs. Locality

Data Visualization

Does mean percent excess death have significant difference between metro and non-metro groups?

gp_cod_data = cod_data %>%
  group_by(year, locality) %>% 
  summarise(mean_ped = mean(percent_excess_death)) 

renderPlotly({
gp_cod_data_Metro = gp_cod_data %>%
  group_by(year) %>%
  filter(locality == "Metropolitan")
gp_cod_data_nonMetro = gp_cod_data %>%
  group_by(year) %>%
  filter(locality == "Nonmetropolitan")
gp_cod = cbind(gp_cod_data_Metro, gp_cod_data_nonMetro)

p = gp_cod %>%
plot_ly( x = ~year, y = ~mean_ped, name = 'Metropolitan', type = 'scatter', mode = 'markers') %>%
  add_trace( y = ~mean_ped1, name = 'Nonmetropolitan', mode = 'markers') %>% 
  layout(yaxis = list(title = 'mean percent excess death (Metropolitan/Nonmetropolitan)'))
p
})
## function (...) 
## {
##     if (length(outputArgs) != 0 && !hasExecuted$get()) {
##         warning("Unused argument: outputArgs. The argument outputArgs is only ", 
##             "meant to be used when embedding snippets of Shiny code in an ", 
##             "R Markdown code chunk (using runtime: shiny). When running a ", 
##             "full Shiny app, please set the output arguments directly in ", 
##             "the corresponding output function of your UI code.")
##         hasExecuted$set(TRUE)
##     }
##     if (is.null(formals(origRenderFunc))) 
##         origRenderFunc()
##     else origRenderFunc(...)
## }
<<<<<<< HEAD
## <environment: 0x7faa0ae5aae0>
=======
## <environment: 0x7ffdcf4876d0>
>>>>>>> 667868adab929cae16d0a55c63183dce21b1a2c5
## attr(,"class")
## [1] "shiny.render.function" "function"             
## attr(,"outputFunc")
## function (outputId, width = "100%", height = "400px", inline = FALSE) 
## {
##     htmlwidgets::shinyWidgetOutput(outputId = outputId, name = "plotly", 
##         width = width, height = height, inline = inline, package = "plotly")
## }
## <environment: namespace:plotly>
## attr(,"outputArgs")
## list()
## attr(,"hasExecuted")
## <Mutable>
##   Public:
##     clone: function (deep = FALSE) 
##     get: function () 
##     set: function (value) 
##   Private:
##     value: FALSE

This plot shows the gap of mean percent excess death for metropolitan and nonmetropolitan. Nonmetropolitan has a obviously larger mean percent excess death than metropolitan areas. In order to see if the difference is significant, we will run a regression analysis regarding mean excess death and locality.

Regression Analysis

gp_locality_lm = lm(mean_ped ~locality , data = gp_cod_data)
summary(gp_locality_lm)
## 
## Call:
## lm(formula = mean_ped ~ locality, data = gp_cod_data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.5667 -1.0325 -0.3470  0.7922  3.2937 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              33.8866     0.4071  83.238  < 2e-16 ***
## localityMetropolitan     -2.0599     0.5757  -3.578   0.0012 ** 
## localityNonmetropolitan   8.0460     0.5757  13.975 1.13e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.35 on 30 degrees of freedom
## Multiple R-squared:  0.9198, Adjusted R-squared:  0.9145 
## F-statistic: 172.1 on 2 and 30 DF,  p-value: < 2.2e-16
rl4 = broom::tidy(gp_locality_lm)
kable(rl4)
term estimate std.error statistic p.value
(Intercept) 33.886633 0.4071070 83.237648 0.0000
localityMetropolitan -2.059931 0.5757363 -3.577907 0.0012
localityNonmetropolitan 8.045979 0.5757363 13.975110 0.0000
**interpretation on linear regression: **
The regression model is me an percent_e xcess_death = 33.89 - 2. 05*Metropolitan + 8.046Nonmetropolitan. For people in metropolitan, the expected percent excess death has a decrease of 2.06. For people in non-metropolitan aream the expected percent excess death has an increase of 8.046. The p-values show these predictors are both significant. The adjusted R squared is 0.91, which indicates the regression model explains 91% of variation in mean percent excess death due to the variation in locality.

Mean Percent Excess Death vs. Causes of Death

How should we help nonmetropolitan areas improve public health?

cod_datayr2 = read_csv("./data/NCHS_-_Potentially_Excess_Deaths_from_the_Five_Leading_Causes_of_Death.csv") %>%
  clean_names() %>%
  na.omit() %>%
  filter(!(state == "United States")) %>%
  separate(., percent_potentially_excess_deaths, into = c("percent_excess_death"), sep = "%") %>%
  mutate(percent_excess_death = as.numeric(percent_excess_death), mortality = observed_deaths/population * 10000, mortality = as.numeric(mortality)) %>%
  select(year, cause_of_death, state, benchmark, locality, observed_deaths, population, expected_deaths, potentially_excess_deaths, percent_excess_death, mortality)
## Warning: Too many values at 191748 locations: 1, 2, 3, 4, 5, 6, 7, 8, 9,
## 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...
cod_datayr2 %>%
  filter(benchmark == "2010 Fixed")%>%
  group_by(cause_of_death,year, locality)%>%
  mutate(year_percent_death=mean(percent_excess_death))%>%
  distinct(year, cause_of_death, locality, .keep_all = TRUE)%>%
  ungroup(year, locality)%>%
ggplot(aes(x = year, y = year_percent_death, color = cause_of_death))+
  geom_point(size = 1)+
  geom_line(size = 1) +
  facet_grid(.~locality)+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))+
  scale_x_continuous( breaks=seq(2005,2015,1))+
  labs(title = "Average yearly excess death percentage for different locality")

interpretation on linear regression: As can be seen in the plot, if we stratify by cause, we will find some increasing pattern in subgroups(unintentional injury) in certain period of time(2010-2015). In nonmetropolitan regions, unintentional injury and chronic lower respiratory disease have on average the highest excess death percentage in nonmetropolitan areas and needs more attention from public health experts.

cod_datayrlm = cod_datayr2 %>%
  filter(locality == "Nonmetropolitan",
         benchmark == "2010 Fixed")
lmpop = lm(data=cod_datayrlm, percent_excess_death~year+cause_of_death)
summary(lmpop)
## 
## Call:
## lm(formula = percent_excess_death ~ year + cause_of_death, data = cod_datayrlm)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -55.456  -9.828   0.576  10.404  39.489 
## 
## Coefficients:
##                                                  Estimate Std. Error
## (Intercept)                                     769.06335   67.48938
## year                                             -0.36941    0.03358
## cause_of_deathChronic Lower Respiratory Disease  27.06490    0.33901
## cause_of_deathHeart Disease                      13.90257    0.32844
## cause_of_deathStroke                             12.91441    0.33784
## cause_of_deathUnintentional Injury               31.18470    0.32812
##                                                 t value Pr(>|t|)    
## (Intercept)                                       11.39   <2e-16 ***
## year                                             -11.00   <2e-16 ***
## cause_of_deathChronic Lower Respiratory Disease   79.83   <2e-16 ***
## cause_of_deathHeart Disease                       42.33   <2e-16 ***
## cause_of_deathStroke                              38.23   <2e-16 ***
## cause_of_deathUnintentional Injury                95.04   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.92 on 19718 degrees of freedom
## Multiple R-squared:  0.365,  Adjusted R-squared:  0.3649 
## F-statistic:  2267 on 5 and 19718 DF,  p-value: < 2.2e-16
rl5 = broom::tidy(lmpop)
kable(rl5)
term estimate std.error statistic p.value
(Intercept) 769.0633548 67.4893807 11.39532 0
year -0.3694128 0.0335766 -11.00209 0
cause_of_deathChronic Lower Respiratory Disease 27.0649028 0.3390124 79.83454 0
cause_of_deathHeart Disease 13.9025703 0.3284411 42.32896 0
cause_of_deathStroke 12.9144132 0.3378397 38.22645 0
cause_of_deathUnintentional Injury 31.1847036 0.3281228 95.03972 0

From the linear model based on 2010 fixed benchmark, we can tell that controlling for cause of death, the excess death percentage is expected to decrease by 0.37 each year. Overall, the excess death percentage shows a decreasing pattern with time. Besides, in the same year, compared with cancer, other 4 causes have significantly higher excess death percentage. Herein unintentional injury and chronic lower respiratory disease have on average the highest excess death percentage.

Mean Percentage of Excess Death vs. Five Leading Cause of Death

<<<<<<< HEAD

For each of the five cause of death, what is the mean percent of excess death? It is different across locality. As shown in the boxplots for all three regions, the rank of the five cause of death remain the same for different localities. For all five causes, the mean percent of death is lower for Metropolitan and higher for Nonmetropolotan.

=======

For each of the five cause of death, what is the mean percent of excess death? It is different across locality.

<<<<<<< HEAD As shown in the boxplots for all three regions, the rank of the five cause of death remain the same for different localities. For all five causes, the mean percent of death is lower for Metropolitan and higher for Nonmetropolotan.

>>>>>>> 667868adab929cae16d0a55c63183dce21b1a2c5

cod_data %>%
  select(state, locality, percent_excess_death, hhs_region, cause_of_death) %>%
  filter(locality != "All") %>% 
  mutate(hhs_region = as.factor(hhs_region)) %>% 
  group_by(cause_of_death, locality) %>% 
  summarise(mean_ped = mean(percent_excess_death)) %>% 
  ungroup() %>% 
  ggplot(aes(x = cause_of_death,  y = mean_ped, fill = cause_of_death)) +
  geom_col() +
  facet_grid(~locality) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) 
<<<<<<< HEAD

=======

<<<<<<< HEAD

>>>>>>> 667868adab929cae16d0a55c63183dce21b1a2c5

For the above graph, if we seperate the effect of the gap between urban and rural area into different HHS region(Figure “above”) as mentioned, the following chart gives detailed information.

cod_data %>%
  select(state, locality, percent_excess_death, hhs_region, cause_of_death) %>%
  filter(locality != "All") %>% 
  mutate(hhs_region = as.factor(hhs_region)) %>% 
  group_by(cause_of_death, locality,hhs_region) %>% 
  summarise(mean_ped = mean(percent_excess_death)) %>% 
  ggplot(aes(x = hhs_region,  y = mean_ped, fill = hhs_region)) +
  geom_col() +
  facet_grid(~cause_of_death) +
  theme(axis.text.x = element_text(angle = 15)) +
    labs(
      y = "Mean %Excess Death",
      x = "HHS Regions",
      fill = "HHS Region"
    ) 

In order to take the effect of Localtiy into account, Flexdashboard with Shiny will be helpful. the link to the Shiny page is Here - “Comparison of Difference in Mean %Excess Death of Five Leading Causes of Death in each HHS regions between Metropolitan and Nonmetropolitan Area”.

<<<<<<< HEAD

Regress percent of excess death on cause of death, which is a categorical variable. with respect to cancer, the least of the five causes, all 4 have additional percent of death as evaluated by the estimates. The results give statisically significant estimates .

interpretation on linear regression: yz3306:Regress percent of excess death on cause of death, which is a categorical variable. with respect to cancer, the least of the five causes, all 4 have additional percent of death as evaluated by the estimates. The results give statisically significant estimates.

cod_lm_yz1 = lm(percent_excess_death ~cause_of_death, data = cod_data)
rl6 = broom::tidy(cod_lm_yz1)
kable(rl6)
term estimate std.error statistic p.value
(Intercept) 21.60686 0.0839908 257.25277 0
cause_of_deathChronic Lower Respiratory Disease 21.08880 0.1209180 174.40579 0
cause_of_deathHeart Disease 10.89773 0.1187990 91.73246 0
cause_of_deathStroke 12.92196 0.1204678 107.26483 0
cause_of_deathUnintentional Injury 25.77100 0.1187583 217.00389 0
=======

Regress percent of excess death on cause of death, which is a categorical variable. with respect to cancer, the least of the five causes, all 4 have additional percent of death as evaluated by the estimates. The results give statisically significant estimates .

interpretation on linear regression: yz3306:Regress percent of excess death on cause of death, which is a categorical variable. with respect to cancer, the least of the five causes, all 4 have additional percent of death as evaluated by the estimates. The results give statisically significant estimates . >>>>>>> 2e142df512c521796baa4d407ffc2f6d3065ae49

cod_lm_yz1 = lm(percent_excess_death ~cause_of_death, data = cod_data)
broom::tidy(cod_lm_yz1)
##                                              term estimate  std.error
## 1                                     (Intercept) 21.60686 0.08399078
## 2 cause_of_deathChronic Lower Respiratory Disease 21.08880 0.12091802
## 3                     cause_of_deathHeart Disease 10.89773 0.11879905
## 4                            cause_of_deathStroke 12.92196 0.12046778
## 5              cause_of_deathUnintentional Injury 25.77100 0.11875825
##   statistic p.value
## 1 257.25277       0
## 2 174.40579       0
## 3  91.73246       0
## 4 107.26483       0
## 5 217.00389       0
>>>>>>> 667868adab929cae16d0a55c63183dce21b1a2c5

Map

map_cod_data = cod_data %>%
  filter(locality == "Metropolitan") %>%
  select(state, locality, percent_excess_death) %>% 
  group_by(state) %>% 
  summarise(mean_ped = mean(percent_excess_death)) %>% 
  dplyr::filter(!(state == "District of\nColumbia"))
  

map = as.tibble(fifty_states) %>%
  group_by(id) %>% 
  summarize(clong = mean(long), clat = mean(lat)) %>% 
  filter(!(id == "district of columbia"))

df <- cbind(map, state.abb, state.center, rate = unique(map_cod_data$mean_ped))

ggplot(df, aes(map_id = id)) + 
    geom_map(aes(fill = rate), map = fifty_states) + 
    expand_limits(x = fifty_states$long, y = fifty_states$lat) + 
    labs(x = "", y = "") +
    theme(panel.background = element_blank(), 
          axis.text.x = element_blank(), 
          axis.text.y = element_blank(), 
          axis.ticks = element_blank()) + 
    geom_text(aes(x = clong, y = clat, label = state.abb)) +
  scale_fill_gradient(low="gold", high="red")

Here we have generated a map of the United States bases with color gradient showing the mean percent of excess death rate regardless of the locality. Here is a Shiny link for a closer look of the national distribution of mean percent excess death varied by locality – metropolitan and nonmetropolitan areas. We observe for both metropolitan and nonmetropolitan areas, the Southeast region appears to have significantly high mean percent of excess death rate than the others, but the rural-urban gap is not apparent. This can be explained by the geographic features of the Southeast region.

<<<<<<< HEAD

If we compare this map with the poverty map of the US, we see that states with higher mean percent excess death is more likely to be in the poverty area.

=======


If we compare this map with the poverty map of the US, we see that states with higher mean percent excess death is more likely to be in the poverty area.

>>>>>>> 667868adab929cae16d0a55c63183dce21b1a2c5

Conclusion

Notes